Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis
Abstract
We consider the problem of query-focused multi-document summarization, where a summary containing the information most relevant to a user's information need is produced from a set of topic-related documents. We propose a new method based on probabilistic latent semantic analysis, which allows us to represent sentences and queries as probability distributions over latent topics. Our approach combines query-focused and thematic features computed in the latent topic space to estimate the summary relevance of sentences. In addition, we evaluate several different similarity measures for computing sentence-level feature scores. Experimental results show that our approach outperforms the best reported results on DUC 2006 data, and also compares well on DUC 2007 data.
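As an illustration of scoring sentences in a latent topic space, the following minimal sketch assumes that a PLSA model has already produced a topic distribution for each sentence and for the query. The two similarity measures shown (cosine similarity and a similarity derived from Jensen-Shannon divergence), the function names, and the toy distributions are all assumptions chosen for illustration; the abstract does not state which measures were evaluated, and this is not the authors' implementation.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2), so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


def rank_sentences(sentence_topics, query_topics, similarity):
    """Return sentence ids ordered from most to least similar to the query."""
    scores = {
        sid: similarity(dist, query_topics)
        for sid, dist in sentence_topics.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical topic distributions over three latent topics.
query = [0.7, 0.2, 0.1]
sentences = {
    "s1": [0.6, 0.3, 0.1],
    "s2": [0.1, 0.1, 0.8],
    "s3": [0.7, 0.2, 0.1],
}

print(rank_sentences(sentences, query, cosine_similarity))
print(rank_sentences(sentences, query, js_similarity))
```

Passing the similarity function as a parameter makes it easy to swap measures and compare the resulting rankings, which mirrors the kind of comparison the abstract describes.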
Similar resources
Personalized Multi-Document Summarization using N-Gram Topic Model Fusion
We consider the problem of probabilistic topic modeling for query-focused multi-document summarization. Rather than modeling topics as distributions over a vocabulary of terms, we extend the probabilistic latent semantic analysis (PLSA) approach with a bigram language model. This allows us to relax the conditional independence assumption between words made by standard topic models. We present a...
Multi-document Summarization using Probabilistic Topic-based Network Models
Multi-document summarization has obtained much attention in the research domain of text summarization. In the past, probabilistic topic models and network models have been leveraged to generate summaries. However, previous studies do not investigate different combinations of various topic models and network models. This paper describes an integrated approach considering both probabilistic topic...
Comparative Summarization via Latent Dirichlet Allocation
This paper aims to explore the possibility of using Latent Dirichlet Allocation (LDA) for multi-document comparative summarization which detects the main differences in documents. The first two sections of this paper focus on the definition of comparative summarization and a brief explanation of using the LDA topic model in this context. In the last three sections, our novel method for multi-do...
The CIST Summarization System at TAC 2011
In this report, we present our extractive summarization system on both summarization and multiling tracks of TAC 2011. We introduce an extractive multi-document summarization method based on hierarchical topic model of hierarchical Latent Dirichlet Allocation (hLDA) and sentence compression. hLDA is a representative generative probabilistic model, which not only can mine latent topics from a la...
Integrating Clustering and Multi-Document Summarization by Bi-Mixture Probabilistic Latent Semantic Analysis (PLSA) with Sentence Bases
Probabilistic Latent Semantic Analysis (PLSA) has been popularly used in document analysis. However, as it is currently formulated, PLSA strictly requires the number of word latent classes to be equal to the number of document latent classes. In this paper, we propose Bi-mixture PLSA, a new formulation of PLSA that allows the number of latent word classes to be different from the number of late...
Publication date: 2009